

### **Ringnet-Oriented Network on Chip Based on FPGA**

Awantika Juyal

Student, Department of VLSI Design, Uttarakhand Technical University, Dehradun, Uttarakhand - 248007

Submitted: 10-02-2021

Revised: 20-02-2021

Accepted: 26-02-2021

ABSTRACT: This paper gives a brief account of the need for on-chip architectures and of the move from traditional interconnect methods to on-chip networks. Based on this analysis, an FPGA-based NoC system called Ringnet is proposed. Ringnet has a configuration of 16 nodes and a 64-bit data transfer capability, synchronized by a single clock. The Ringnet system is a control chip that controls 16 nodes, or processing elements, and transfers 64-bit data to the nodes. The functionality of each node and of the system is checked with ModelSim 10.0. The communication between nodes can be verified on a Virtex-5 FPGA. The modelling is done in Xilinx ISE 14.2 using the VHDL programming language, and the design is synthesized on a commercially manufactured FPGA with the selected device 3s500efg320-5.

The hardware and timing parameters are obtained from synthesis, which reports a maximum frequency of 302.9 MHz and a maximum memory utilization of 4819160 kB.

Keywords: Network on chip, Hardware description language, Very large scale integration, Field programmable gate array.

#### I. INTRODUCTION

Chip integration has reached a level where all components of a system can be placed on a single chip, so that everything required for a specialized application sits on one piece of silicon. Implementing a complete system on a single chip has become possible because of the rapid development of VLSI design and the invention of new design tools that provide accuracy in design. Put simply, an SoC can be described as "an integrated circuit that collects multiple stand-alone VLSI designs used for a particular application to provide complete functionality."

Today's network-on-chip designs are complex distributed systems. As a result, it is very hard to design them by hand and still satisfy the ever-growing demand for high performance and better quality in new and upcoming SoCs at small die area, low leakage and low power cost. Innovative design tools are therefore of utmost importance, much like the abundance of tools used today for network management and cloud computing. Synthesis tools are required to ensure formal correctness, low power, low area and good timing and delay results. Performance evaluation tools are required to ensure high system performance, quality of service and low latency. Optimization tools are needed to find the best designs and to resolve traffic management and power management problems. This has led to the emergence of NoC design tools and platforms that can create the best designs from high-level specifications.

There are three leading commercial providers of such platforms and network-on-chip designs: Arm, Arteris and NetSpeed Systems.

A network on chip is regarded as a group of I/O devices, memory or storage, and computational devices that are coupled to one another through routers and switches instead of point-to-point wires [3]. These on-chip devices communicate with each other by means of data packets, which are routed through different network topologies in the manner of a traditional network. From this basic definition it is clear that implementing a network on a single chip requires sophisticated research results and advanced tools borrowed from traditional computer networks. The advantages of this approach are driving design researchers to adopt on-chip networks for future SoCs.

An on-chip network is a system for organizing different operating modules that are placed on the same chip and communicate with each other using data packets. Its main aim is to combine sensors, RAM and ROM modules, device controllers, stand-alone devices and many other cores, all placed on a silicon substrate. At present, the on-chip network is a promising development area for microprocessor technologies in general and for single-chip systems in particular. Its evolution is quite similar to that of large communication networks such as telephony: first, direct communication through wires was established between two or more devices, then came matrix switches (crossbars), then relay switching of signals, which corresponds to the NoC today. Only after these steps did the transmission of data in the form of packets become possible, as on the internet with the TCP/IP protocol, which has its analogue in the NoC.

Previously, the main way of improving the performance of a processing element was to increase its clock frequency. After a level of about 2 GHz was reached, designers and engineers encountered two kinds of problems. The first is the limitation of the physical materials: reducing the feature size thins the SiO2 layer, a very thin silicon layer cannot guarantee precise switching of the transistor during operation, and leakage currents lead to overheating of the crystal, excessive power consumption and unreliable operation. The second is technical: reducing the size of a logic element also reduces its power consumption, but with present lithography techniques and semiconductor materials it is difficult to achieve the desired result. Consequently, researchers are looking for other solutions.

The newest innovations in VLSI enable computers to work stably at 3.7 to 4 GHz. Any additional increase in frequency results in unwanted effects such as heat dissipation that cannot be handled under normal home conditions and requires extra cooling systems, up to liquid nitrogen.

Pipelined computation and branch prediction were the first, and expensive, solutions introduced for better performance. The main idea behind pipelining is to divide the execution of an operation into micro-operations, which are then sent to the pipeline for computation. The fix for already released devices is nontrivial and also decreases processor performance by 9 to 24%. The first batches of processor microcode firmware (Intel) and Windows security updates (Microsoft) have already led to many problems and failures.

After pipelined computation came the idea of parallelizing the calculation by integrating several cores in a single package. The justification for task parallelization goes back to Gaspard de Prony, who parallelized the calculation of trigonometric and logarithmic tables for a real-estate ownership register at the end of the 18th century. Parallel calculation has since been developed in computer technology, where the joint work of several processors (up to thousands) is used for a calculation. This approach remained the domain of mainframes until the development of supercomputers based on desktop processors.

With parallel computing, the focus moves from the maximum number of operations per second in a single execution thread to executing several threads simultaneously, up to two per core, on a processor with multiple cores. The idea is effective, with one caveat: how well parallel computation performs depends greatly on how effectively the task can be parallelized and on how the developer does it.

These are the main problems. There are three possible routes to solving them:

1. **Increase clock frequency:** At present there are many limitations on increasing the clock frequency beyond 5 GHz without an expensive cooling system. Research in this area continues, but results will not arrive quickly.

2. **Specialization and multiplication of cores:** Multiplying cores increases processor performance only up to a limit. If the communication between processors is based on a bus architecture, the approach exhausts itself at about 16 cores per die, so adding further cores beyond this point is not practical.

3. **Parallel programming methods:** Parallel programming methods are already well established, but they have the following issues:

- Firstly, there are many tasks that cannot be parallelized. Many problems must be solved using sequential methods.
- Secondly, much depends on the programmer, such as the tools used and the programmer's skills. Chipmakers can contribute only small changes; Intel, AMD and Qualcomm are the leading vendors and help as far as they can.

In the field of NoC systems and elements, well-known companies have created hardware and theoretical solutions for Intel and Qualcomm, which are world-renowned chip vendors.

#### **RING NOC STRUCTURE**

Fig. 1 shows a block diagram designed using the ring topology; the diagram shows a ring net arrangement of 64 nodes. Every node has its own address and its own processing element (PE). We assume that the connected nodes work at the same frequency and operate synchronously.

The working of a NoC connected in a ring can be understood as follows. There are nodes N0 to N63, which are addressed by 6-bit values from "000000" to "111111" for a total of 64 nodes. Node N0 has the source\_address "000000", node N1 has the source\_address "000001", and similarly every other node is specified with 6 address bits, up to the final node N63 with the address "111111". This arrangement operates in full duplex mode, where any pair of nodes in the network can communicate with each other. Let us assume that node N0 wants to communicate in the network.
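The addressing scheme above can be sketched in a few lines of Python. This is only a behavioral illustration, not the VHDL used for the design; the function names `node_address` and `clockwise_hops` are hypothetical, and the hop count assumes that packets travel around the ring in one direction only, which the text does not specify.

```python
NUM_NODES = 64   # ring size stated in the text
ADDR_BITS = 6    # 2**6 == 64 distinct addresses


def node_address(index: int) -> str:
    """Return the 6-bit binary address string of node N<index>."""
    if not 0 <= index < NUM_NODES:
        raise ValueError(f"node index out of range: {index}")
    return format(index, f"0{ADDR_BITS}b")


def clockwise_hops(src: int, dst: int) -> int:
    """Number of hops from src to dst on a one-directional ring (assumption)."""
    return (dst - src) % NUM_NODES


if __name__ == "__main__":
    print(node_address(0), node_address(1), node_address(63))
    print(clockwise_hops(62, 1))  # wraps around past N63 back to N0
```

The modulo operation is what closes the ring: a packet leaving N63 arrives at N0 on the next hop.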



Fig. 1: Ring topological structure

Two addresses are used for communication between nodes. The source\_address identifies the node that wants to communicate in the system, and the destination\_address identifies the node with which it wants to communicate. When more than one node wants to communicate with a single node, first-in first-out (FIFO) logic [19] is used to prioritize the communicating nodes, based on the structure of the node. The data travels in the form of data packets, which include the data itself, the source\_address and the destination\_address.
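A minimal Python sketch of the packet and of FIFO arbitration at one destination is given here for illustration. The field names follow the text; the class names `Packet` and `DestinationQueue` and the methods `request` and `grant` are hypothetical and do not come from the VHDL design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source_address: int       # node that wants to communicate
    destination_address: int  # node to be communicated with
    data: int                 # payload, assumed to fit in 64 bits

    def __post_init__(self) -> None:
        if not 0 <= self.data < 2 ** 64:
            raise ValueError("data does not fit in 64 bits")


class DestinationQueue:
    """FIFO of packets waiting for a single destination node."""

    def __init__(self) -> None:
        self._pending = deque()

    def request(self, packet: Packet) -> None:
        # Requests are served strictly in arrival order.
        self._pending.append(packet)

    def grant(self):
        # Return the oldest waiting packet, or None if nothing is waiting.
        return self._pending.popleft() if self._pending else None


if __name__ == "__main__":
    queue = DestinationQueue()
    queue.request(Packet(1, 5, 0xAA))
    queue.request(Packet(2, 5, 0xBB))
    print(queue.grant().source_address)  # node 1 asked first
    print(queue.grant().source_address)
```

With this ordering, a node that requests first is always served first, so no sender to the same destination can be overtaken.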

#### DATA PATH ARCHITECTURE

The data path architecture of the ring NoC is shown in Fig. 2. The architecture consists of a decoder ( $4 \times 16$ ) at the input side, a demultiplexer ( $1 \times 16$ ), or decoder with enable input, at the output side, and the token control logic. The decoder provides the outputs D0 ... D15 based on the address A0 A1 A2 A3 corresponding to each node. The addresses are "0000", "0001", ..., "1111" for node 0, node 1, ..., node 15.

These nodes are connected to the token control logic as data\_in. In a real-time scenario only one node at a time is able to communicate with any other node. Therefore a demultiplexer is required at the output end of the data path architecture. It accepts the data from any of the nodes 0 to 15 and, based on the address (A0, A1, A2, A3), drives the output D0 to D15 corresponding to node 0 to node 15.
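The behavior of the two blocks can be sketched as follows. This is a behavioral Python model for illustration only, not the synthesized VHDL; the function names `decoder_4x16` and `demux_1x16` are hypothetical, and the 4-bit address A0 to A3 is passed as a plain integer from 0 to 15.

```python
def decoder_4x16(address: int, enable: bool = True) -> list:
    """One-hot decode: output D[address] is 1 and all other outputs are 0."""
    if not 0 <= address < 16:
        raise ValueError(f"address out of range: {address}")
    return [1 if (enable and i == address) else 0 for i in range(16)]


def demux_1x16(data: int, address: int) -> list:
    """Route data to output D[address]; every other output stays 0."""
    if not 0 <= address < 16:
        raise ValueError(f"address out of range: {address}")
    outputs = [0] * 16
    outputs[address] = data
    return outputs


if __name__ == "__main__":
    print(decoder_4x16(0b0011))        # selects node 3 at the input side
    print(demux_1x16(0xCAFE, 0b1001))  # delivers data to node 9
```

Because both functions take a single address, only one node is selected at a time, which matches the statement that only one node communicates at any moment.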



Fig. 2: Block diagram of Ringnet NoC based on FPGA

The control logic also has write and read signals, used to write the content of the source address into memory and to send the data from the output port corresponding to the destination node. The address input of the decoder (4×16) is connected to the source\_address of the control unit, and the address input of the demultiplexer (1×16) is connected to the destination\_address of the token logic. The token logic data\_in is connected to node 0 ... node 15 as the output of the decoder. The data\_out is connected to the demultiplexer outputs D0 to D15 for each node. The clock signal, with a 50% duty cycle, is supplied to the nodes, and the reset signal sets all nodes to zero as the initial condition.
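One way to picture the write, read, clock and reset behavior is the following behavioral Python model. It is a sketch under stated assumptions rather than the actual control logic: the class name `RingnetControl` and its `clock` method are hypothetical, the memory is assumed to hold one 64-bit word per source node, and a read in the same cycle as a write is assumed to return the previously stored value.

```python
MASK64 = (1 << 64) - 1  # keep stored data within 64 bits
NUM_NODES = 16


class RingnetControl:
    """Behavioral model of the token control logic (illustrative only)."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Reset drives all node values to zero, as described in the text.
        self.memory = [0] * NUM_NODES
        self.data_out = [0] * NUM_NODES

    def clock(self, source_address: int, destination_address: int,
              data_in: int, write: bool, read: bool) -> list:
        """Model one rising clock edge and return the D0..D15 outputs."""
        stored = self.memory[source_address]  # value before this edge
        if write:
            self.memory[source_address] = data_in & MASK64
        if read:
            self.data_out = [0] * NUM_NODES
            self.data_out[destination_address] = stored
        return self.data_out


if __name__ == "__main__":
    ctrl = RingnetControl()
    ctrl.clock(3, 9, 0xDEADBEEF, write=True, read=False)   # cycle 1: store
    out = ctrl.clock(3, 9, 0, write=False, read=True)      # cycle 2: deliver
    print(hex(out[9]))
```

In this model a transfer takes two clock cycles, one to write the data of the source node into memory and one to present it at the output port of the destination node.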

#### **II. METHODOLOGY**

FPGA design supports both the bottom-up and the top-down design methodology. Hierarchical design partitioning supports the top-down methodology, and modular design flows support the bottom-up methodology; a similar process is used for ASIC standard cell devices. VHDL and Verilog HDL are supported by the Xilinx design software [1] and are becoming part of the traditional ASIC standard cell design methodology. The design suite also supports an incremental compilation methodology for team-based design, as is typical of ASIC design.



It is recommended to use the hard-copy-first flow when designing the ASIC hard copy, which achieves higher performance than the FPGA. Development of the controller design includes the following steps for both standard cell ASICs and FPGAs.

- Selection of the bottom-up or top-down methodology
- RTL coding in the Xilinx tool
- Hardware optimization using the Xilinx tool
- Simulation and functional verification in ModelSim
- Internal and external memory synthesis specification
- Review of the synthesis report

The following software development tools are required for implementation on the FPGA:

- Xilinx ISE software: Xilinx has been a leader in the semiconductor industry in business achievement, market share and technology. This tool is used for viewing the RTL (register-transfer level) schematic and for designing the IC. With its help we can obtain the parameters required for implementing the chip in the FPGA environment.
- ModelSim SE 5.4a or 10.0d from Mentor Graphics: Mentor Graphics was the first company to combine single kernel simulator (SKS) technology with a unified debugging environment for SystemC, VHDL and Verilog. The combination of industry-leading SKS performance, an analysis environment and a well-integrated debugger makes the ModelSim simulator a common choice for both FPGA and ASIC design. To fit the majority of processes and tool flows, it is important to use established standards and platforms.

#### SPARTAN-3E FPGA Synthesis

FPGA synthesis is the process of translating and optimizing the program for a specific FPGA. After translation, the same code is verified against several inputs, and its cost is reported in terms of LUTs. If the hardware usage exceeds 100 % of the resources of the chosen FPGA, the VHDL code can be optimized to reduce the hardware resources it needs.
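The 100 % check mentioned above amounts to dividing each used resource count by the amount available on the device. A small Python sketch of that arithmetic follows; the function names `utilization_percent` and `over_capacity` are hypothetical, and the capacity figures in the usage example are placeholders, not values of any real device. The real figures should be taken from the synthesis report.

```python
def utilization_percent(used: int, available: int) -> float:
    """Percentage of a device resource consumed by the design."""
    if available <= 0:
        raise ValueError("available resource count must be positive")
    return 100.0 * used / available


def over_capacity(usage: dict, capacity: dict) -> list:
    """Names of resources whose utilization exceeds 100 %."""
    return [name for name, used in usage.items()
            if utilization_percent(used, capacity[name]) > 100.0]


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    usage = {"slices": 120, "luts": 80}
    capacity = {"slices": 100, "luts": 100}
    print(over_capacity(usage, capacity))
```

Any resource returned by `over_capacity` indicates that the design must be optimized or moved to a larger device.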



Fig. 3: Spartan-3E FPGA view

The Spartan-3E FPGA kit includes peripherals such as keyboard and PS/2 mouse ports and an LCD screen. The LCD has a size of 2 lines by 16 characters and is used to display the output. A keyboard and a PS/2 mouse can be attached to the FPGA through their ports. The Video Graphics Array (VGA) output port is used to show programmed images on the PC screen. The encoding is performed by the FPGA with the assistance of the package and program, and the encoded images can then be shown on the screen. Serial data is communicated through two 9-pin RS-232 connectors on the FPGA board. A clock oscillator with a frequency of 50 MHz provides the system input clock. The rising edge of the enabled clock signal determines many of the events taking place within the FPGA, and many programs operate with respect to the clock signal only. Fig. 3 shows the view of the Spartan-3E FPGA.

The Spartan-3E kit features an on-board USB and debugging interface, which supports downloading the programming file into the FPGA through a USB-based download cable. It is therefore very useful for testing digital logic HDL programs with the help of the input switches and the eight discrete output LEDs, which are interfaced to the FPGA kit and light up when an output becomes high.

#### **III. RESULTS ANALYSIS**

In the result analysis we compare the following parameters: slice usage, slice flip-flops, LUT usage, IOBs, GCLKs and memory usage. These hardware parameters are reported for the ROM decoder, the demultiplexer and the Ringnet NoC logic chip in Table 1.



| Parameter        | Decoder    | Demultiplexer | NoC Ringnet logic |
|------------------|------------|---------------|-------------------|
| Slices usage     | 9          | 9             | 5433              |
| Slice flip-flops | 16         | 16            | 1024              |
| LUTs usage       | 17         | 17            | 8736              |
| IOBs             | 22         | 23            | 2058              |
| GCLKs            | 1          | 1             | 1                 |
| Memory usage     | 4521560 kB | 4520728 kB    | 4819160 kB        |

Table 1: Xilinx hardware parameters for Ringnet (64 bit)

The other parameters, namely frequency, minimum period (ns), minimum time before the clock signal (ns), maximum time after the clock signal (ns) and combinational delay (ns), are the software (timing) parameters. They are listed in Table 2 for the decoder, the demultiplexer and the Ringnet logic chip.

| Parameter                         | Decoder | Demultiplexer | Ringnet NoC chip |
|-----------------------------------|---------|---------------|------------------|
| Frequency (MHz)                   | 235     | 235           | 302.9            |
| Min period (ns)                   | 1.921   | 2.417         | 3.301            |
| Min time before clock signal (ns) | 3.931   | 4.106         | 7.839            |
| Max time after clock signal (ns)  | 4.368   | 4.368         | 4.380            |
| Combinational delay (ns)          | 6.89    | 6.897         | 7.839            |

Table 2: Xilinx software parameters for Ringnet (64 bit)
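The maximum frequency and the minimum period reported for the Ringnet NoC chip are related by f = 1/T. The short Python check below illustrates this for the Ringnet column only; the function name `max_frequency_mhz` is hypothetical.

```python
def max_frequency_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    if min_period_ns <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / min_period_ns  # 1 / (T * 1e-9 s), expressed in MHz


if __name__ == "__main__":
    # Minimum period of the Ringnet NoC chip from Table 2.
    print(round(max_frequency_mhz(3.301), 1))  # 302.9
```

A minimum period of 3.301 ns therefore corresponds to the 302.9 MHz quoted for the Ringnet chip.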

#### **IV. CONCLUSION**

In this report we first gave a brief description of the NoC, its architecture and functional layers, and different NoC topologies. We then proposed the design of a Ringnet-oriented network on chip based on FPGA. A NoC helps with routing, quality of service, flow and congestion control, and the reliability of the networking system. Our design is a chip that controls the traffic on the communication network. We compared it with the design of ref. [2], which uses 9595 flip-flops, 5650 LUTs and 1 GCLK. The number of flip-flops in our design is lower than in ref. [2]. Since the memory utilization depends on the number of flip-flops, our design requires less on-chip memory when synthesized on the Spartan-3E FPGA. Our design is thus favourable in terms of memory utilization for the Ringnet configuration compared with the existing design.

#### REFERENCES

- [1]. Chang, K. C. (1999). Digital systems design with VHDL and synthesis. IEEE computer society press.
- [2]. Lee, K., Lee, S. J., & Yoo, H. J. (2006). Low-power network-on-chip for high-performance SoC design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(2), 148-160.
- [3]. Lee, H. G., Chang, N., Ogras, U. Y., & Marculescu, R. (2007). On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches. ACM Transactions on Design Automation of Electronic Systems, 12(3), 1-20.

- [4]. Kumar, A., Hansson, A., Huisken, J., & Corporaal, H. (2007, April). An FPGA design flow for reconfigurable network-based multi-processor systems on chip. In 2007 Design, Automation & Test in Europe Conference & Exhibition (pp. 1-6). IEEE.
- [5]. Gindin, R., Cidon, I., & Keidar, I. (2007, May). NoC-based FPGA: architecture and routing. In First International Symposium on Networks-on-Chip (NOCS'07) (pp. 253-264). IEEE.
- [6]. Hadjiat, K., St-Pierre, F., Bois, G., Savaria, Y., Langevin, M., & Paulin, P. (2007, December). An FPGA implementation of a scalable network-on-chip based on the token ring concept. In 2007 14th IEEE International Conference on Electronics, Circuits and Systems (pp. 995-998). IEEE.
- [7]. Murali, S., Atienza, D., Meloni, P., Carta, S., Benini, L., De Micheli, G., & Raffo, L. (2007). Synthesis of predictable networks-on-chip-based interconnect architectures for chip multiprocessors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 15(8), 869-880.
- [8]. Lukovic, S., & Fiorin, L. (2008, June). An automated design flow for NoC-based MPSoCs on FPGA. In 2008 The 19th IEEE/IFIP International Symposium on Rapid System Prototyping (pp. 58-64). IEEE.
- [9]. Bafumba-Lokilo, D., Savaria, Y., & David, J. P. (2008, June). Generic crossbar network on chip for FPGA MPSoCs. In 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference (pp. 269-272). IEEE.
- [11]. Ali, M., Welzl, M., & Zwicknagl, M. (2008, July). Networks on chips: scalable interconnects for future systems on chips. In 2008 4th European Conference on Circuits and Systems for Communications (pp. 240-245). IEEE.
- [12]. Salminen, E., Kulmala, A., & Hamalainen, T. D. (2008). Survey of network-on-chip proposals. white paper, OCP-IP, 1, 13.
- [13]. Bafumba-Lokilo, D., Savaria, Y., & David, J. P. (2008, June). Generic crossbar network on chip for FPGA MPSoCs. In 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference (pp. 269-272). IEEE.

- [14]. Murali, S. (2009). Netchip Tool Flow for NoC Design. In Designing Reliable and Efficient Networks on Chips (pp. 39-42). Springer, Dordrecht.
- [15]. Paliwal, K. K., Gaur, M. S., Laxmi, V., & Janyani, V. (2009, March). Performance analysis of guaranteed throughput and best effort traffic in network-on-chip under different traffic scenario. In 2009 International Conference on Future Networks (pp. 74-78). IEEE.
- [16]. Lan, Y. C., Lin, H. A., Lo, S. H., Hu, Y. H., & Chen, S. J. (2011). A bidirectional NoC (BiNoC) architecture with dynamic self-reconfigurable channel. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(3), 427-440.
- [17]. Millberg, M. (2011). Architectural techniques for improving performance in networks on chip (Doctoral dissertation, KTH Royal Institute of Technology).
- [18]. Łuczak, A., Stępniewski, M., Siast, J., Domański, M., Stankiewicz, O., Kurc, M., & Konieczny, J. (2011). Network-on-Multi-Chip (NoMC) with monitoring and debugging support. Journal of Telecommunications and Information Technology, 81-86.
- [19]. Hoefflinger, B. (2011). ITRS: The international technology roadmap for semiconductors. In Chips 2020 (pp. 161-174). Springer, Berlin, Heidelberg.
- [20]. Kumar, A., Baruah, L., & Sabu, A. (2015). Rotator on Chip (RoC) design based on ring topological NoC. Procedia Computer Science, 45, 540-548.

DOI: 10.35629/5252-0302667672 Impact Factor value 7.429 | ISO 9001: 2008 Certified Journal Page 672

## International Journal of Advances in Engineering and Management ISSN: 2395-5252

# IJAEM

Volume: 03

Issue: 02

DOI: 10.35629/5252

www.ijaem.net

Email id: ijaem.paper@gmail.com